Unsupervised Learning of Stereo Vision with Monocular Cues
Authors
Abstract
We demonstrate unsupervised learning of a stereo vision model involving monocular depth cues (shape from texture cues). We formulate a conditional probability model defining the probability of the right image given the left. This conditional model does not model a probability distribution over images. Maximizing conditional likelihood rather than joint likelihood is similar to using a CRF (Conditional Random Field, [6]) rather than an MRF (joint Markov Random Field).

The most closely related earlier work seems to be that of Zhang and Seitz [8], who give a method for adapting five parameters of a stereo vision model. In contrast, we train highly parameterized monocular depth cues. We also avoid the need for independence assumptions through the use of contrastive divergence training, a general method for optimizing CRFs [4]. There is also related work by Saxena et al. on supervised learning of highly parameterized monocular depth cues [1, 2]. Unlike Saxena et al., we train monocular depth cues as part of unsupervised training of a stereo algorithm. Other related work includes that of Scharstein and Pal [7] and Kong and Tao [5], who perform supervised training of stereo algorithms using general CRF methods.

We focus on histogram of oriented gradient (HOG) features as a (texture) surface orientation cue. As a surface is tilted away from the camera, the edges in the direction of the tilt become foreshortened while the edges orthogonal to the tilt are not. The accompanying image shows the effect on the edge distribution: the average HOG feature for regions of tree trunk and forest floor. The cylindrical shape of the tree trunk is clearly indicated by the warping of the HOG feature.
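The foreshortening argument can be illustrated with a small simulation. The sketch below is not the paper's feature pipeline: it assumes orthographic projection, a texture whose edge directions are uniformly distributed, and a tilt about the vertical axis, and it uses a length-weighted histogram of edge directions as a simplified stand-in for a HOG cell. The function name `edge_orientation_histogram` and all of its parameters are hypothetical.

```python
import numpy as np

def edge_orientation_histogram(tilt_rad, n_edges=100_000, n_bins=8, seed=0):
    """Length-weighted histogram of projected edge directions.

    Assumptions (illustrative only): unit-length edges with uniformly
    random directions on the surface, orthographic projection, and a
    surface tilted about the vertical axis by tilt_rad.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, np.pi, size=n_edges)  # edge directions on the surface

    # Tilting about the vertical axis shrinks the horizontal component
    # by cos(tilt); the vertical component is unchanged.
    dx = np.cos(theta) * np.cos(tilt_rad)
    dy = np.sin(theta)

    length = np.hypot(dx, dy)                      # projected edge length
    angle = np.mod(np.arctan2(dy, dx), np.pi)      # projected direction in [0, pi)

    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, np.pi), weights=length)
    return hist / hist.sum()

flat = edge_orientation_histogram(0.0)
tilted = edge_orientation_histogram(np.deg2rad(60.0))

print("fronto-parallel:", np.round(flat, 3))
print("tilted 60 deg:  ", np.round(tilted, 3))
```

Under these assumptions the fronto-parallel histogram is close to uniform, while the tilted one concentrates its mass in the bins around the vertical direction, that is, on edges orthogonal to the tilt. This is the kind of warping that makes an orientation histogram usable as a surface orientation cue.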
Similar resources
A Machine Learning Approach to Recovery of Scene Geometry from Images
Recovering the 3D structure of the scene from images yields useful information for tasks such as shape and scene recognition, object detection, or motion planning and object grasping in robotics. In this thesis, we introduce a general machine learning approach called unsupervised CRF learning based on maximizing the conditional likelihood. We describe the application of our machine learning app...
Depth Estimation Using Monocular and Stereo Cues
Depth estimation in computer vision and robotics is most commonly done via stereo vision (stereopsis), in which images from two cameras are used to triangulate and estimate distances. However, there are also numerous monocular visual cues (such as texture variations and gradients, defocus, color/haze, etc.) that have heretofore been little exploited in such systems. Some of these cues apply even...
Persistent self-supervised learning principle: from stereo to monocular vision for obstacle avoidance
Self-Supervised Learning (SSL) is a reliable learning mechanism in which a robot uses an original, trusted sensor cue for training to recognize an additional, complementary sensor cue. We study for the first time in SSL how a robot’s learning behavior should be organized, so that the robot can keep performing its task in the case that the original cue becomes unavailable. We study this persiste...
Recovering stereo vision by squashing virtual bugs in a virtual reality environment.
Stereopsis is the rich impression of three-dimensionality, based on binocular disparity, the differences between the two retinal images of the same world. However, a substantial proportion of the population is stereo-deficient, and relies mostly on monocular cues to judge the relative depth or distance of objects in the environment. Here we trained adults who were stereo blind or stereo-deficien...
Extracting 3D Scene-Consistent Object Proposals and Depth from Stereo Images
This work combines two active areas of research in computer vision: unsupervised object extraction from a single image, and depth estimation from a stereo image pair. A recent, successful trend in unsupervised object extraction is to exploit so-called “3D scene-consistency”, that is enforcing that objects obey underlying physical constraints of the 3D scene, such as occupancy of 3D space and gr...
Publication date: 2009